<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Chatterjee Correlation</title>
<link rel="stylesheet" href="../math.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Math Toolkit 4.2.1">
<link rel="up" href="../statistics.html" title="Chapter 6. Statistics">
<link rel="prev" href="linear_regression.html" title="Linear Regression">
<link rel="next" href="../vector_functionals.html" title="Chapter 7. Vector Functionals - Norms">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="linear_regression.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../statistics.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../vector_functionals.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="math_toolkit.chatterjee_correlation"></a><a class="link" href="chatterjee_correlation.html" title="Chatterjee Correlation">Chatterjee Correlation</a>
</h2></div></div></div>
<h4>
<a name="math_toolkit.chatterjee_correlation.h0"></a>
      <span class="phrase"><a name="math_toolkit.chatterjee_correlation.synopsis"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.synopsis">Synopsis</a>
    </h4>
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">statistics</span><span class="special">/</span><span class="identifier">chatterjee_correlation</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>

<span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">statistics</span> <span class="special">{</span>

    <span class="identifier">C</span><span class="special">++</span><span class="number">17</span><span class="special">:</span>
    <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">ExecutionPolicy</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Container</span><span class="special">&gt;</span>
    <span class="keyword">auto</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="identifier">ExecutionPolicy</span><span class="special">&amp;&amp;</span> <span class="identifier">exec</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">u</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">v</span><span class="special">);</span>

    <span class="identifier">C</span><span class="special">++</span><span class="number">11</span><span class="special">:</span>
    <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">Container</span><span class="special">&gt;</span>
    <span class="keyword">auto</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">u</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">v</span><span class="special">);</span>
<span class="special">}</span>
</pre>
<h4>
<a name="math_toolkit.chatterjee_correlation.h1"></a>
      <span class="phrase"><a name="math_toolkit.chatterjee_correlation.description"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.description">Description</a>
    </h4>
<p>
      Classical correlation coefficients such as Pearson's are useful primarily
      for detecting when one dataset depends linearly on another. Pearson's
      coefficient has a known weakness, however: even when the dependent variable
      has an obvious functional relationship with the independent variable, the
      coefficient can take essentially any value if that relationship is not
      linear. As Chatterjee says:
    </p>
<div class="blockquote"><blockquote class="blockquote"><p>
      Ideally, one would like a coefficient that approaches its maximum value
      if and only if one variable looks more and more like a noiseless function of
      the other, just as Pearson correlation is close to its maximum value if and
      only if one variable is close to being a noiseless linear function of the other.
    </p></blockquote></div>
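<p>
      The weakness of Pearson's coefficient can be seen in a minimal sketch. The
      helpers <code class="literal">pearson</code> and <code class="literal">pearson_of_parabola</code>
      are hypothetical names written for this illustration and are not part of
      the library. Here Y = X^2 is a noiseless function of X, yet on a grid that
      is symmetric about zero the sample Pearson correlation vanishes.
    </p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample Pearson correlation of two equally sized samples.
// Hypothetical helper for illustration only; not part of Boost.Math.
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    // Sample means.
    double sum_x = 0, sum_y = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
    }
    const double mean_x = sum_x / n;
    const double mean_y = sum_y / n;

    // Centered cross and auto products.
    double sxy = 0, sxx = 0, syy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}

// y = x^2 sampled on a grid that is symmetric about zero.
double pearson_of_parabola()
{
    const std::vector<double> x{-2, -1, 0, 1, 2};
    std::vector<double> y;
    for (double xi : x) {
        y.push_back(xi * xi);
    }
    return pearson(x, y);
}
```

<p>
      In this sketch the positive and negative halves of the parabola cancel in
      the cross product, so the coefficient is zero even though Y is fully
      determined by X.
    </p>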
<p>
      This is the problem Chatterjee's coefficient solves. Let X and Y be random
      variables, where Y is not constant, and let (X_i, Y_i), i = 0, ..., n-1, be
      samples from their joint distribution. Rearrange these samples so that
      X_{(0)} &lt; X_{(1)} &lt; ... &lt; X_{(n-1)}, with Y_{(i)} the value paired with
      X_{(i)}, and form the pairs (R(X_{(i)}), R(Y_{(i)})), where R denotes the rank
      of a value within its own sample. The Chatterjee correlation is then given
      by
    </p>
<p>
      <span class="inlinemediaobject"><object type="image/svg+xml" data="../../equations/chatterjee_correlation.svg"></object></span>
    </p>
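<p>
      As a rough sketch of how such a rank statistic can be evaluated, the code
      below implements the no-ties formula from Chatterjee's paper, 1 - 3 S / (n^2 - 1),
      where S is the sum of absolute differences between consecutive ranks of Y.
      It assumes that X is already sorted and that Y has no ties. The name
      <code class="literal">xi_sketch</code> is hypothetical, and this is not the
      library's implementation.
    </p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch of the no-ties formula from Chatterjee's paper:
//   xi_n = 1 - 3 * S / (n^2 - 1),
// where S sums |r_{i+1} - r_i| over consecutive samples and r_i is the rank
// of the Y value paired with the i-th smallest X.
// Assumptions: x is sorted in ascending order and y contains no ties.
// xi_sketch is a hypothetical name; it is not part of Boost.Math.
double xi_sketch(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    assert(std::is_sorted(x.begin(), x.end()));
    const std::size_t n = y.size();

    // order[k] is the index of the k-th smallest y.
    // This sort is the O(n log n) step.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&y](std::size_t a, std::size_t b) { return y[a] < y[b]; });

    // rank[i] is the 1-based rank of y[i].
    std::vector<std::size_t> rank(n);
    for (std::size_t k = 0; k < n; ++k) {
        rank[order[k]] = k + 1;
    }

    // Sum of absolute differences between consecutive ranks,
    // written to avoid unsigned underflow.
    std::size_t sum = 0;
    for (std::size_t i = 1; i < n; ++i) {
        sum += rank[i] > rank[i - 1] ? rank[i] - rank[i - 1]
                                     : rank[i - 1] - rank[i];
    }

    return 1.0 - 3.0 * static_cast<double>(sum) / static_cast<double>(n * n - 1);
}
```

<p>
      Under these assumptions, the sample X = Y = {1, 2, 3, 4, 5} has consecutive
      rank differences that sum to 4, so the sketch returns 1 - 12/24 = 0.5.
    </p>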
<p>
      In the limit of infinitely many i.i.d. samples, the statistic lies in [0,
      1]: it converges to zero if X and Y are independent, and to unity if Y is
      a measurable function of X. For a finite sample, however, the statistic
      may be negative. The complexity is O(n log n).
    </p>
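<p>
      To see why the limiting values are only approached for finite n, a short
      worked calculation may help. It assumes the no-ties formula from
      Chatterjee's paper, with r_i the rank of Y_{(i)}, and takes Y to be a
      strictly increasing function of X, so that r_i = i + 1:
    </p>

```latex
\xi_n = 1 - \frac{3\sum_{i=0}^{n-2} \lvert r_{i+1} - r_i \rvert}{n^2 - 1},
\qquad
r_i = i + 1 \;\Longrightarrow\; \sum_{i=0}^{n-2} \lvert r_{i+1} - r_i \rvert = n - 1,
\qquad
\xi_n = 1 - \frac{3(n-1)}{n^2 - 1} = 1 - \frac{3}{n+1} = \frac{n-2}{n+1}.
```

<p>
      For n = 5 this gives 0.5, and the value tends to 1 only as n grows.
    </p>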
<p>
      An example is given below:
    </p>
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">vector</span><span class="special">&gt;</span>
<span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">statistics</span><span class="special">/</span><span class="identifier">chatterjee_correlation</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>

<span class="comment">// X must already be sorted in ascending order (see Invariants)</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special">&lt;</span><span class="keyword">double</span><span class="special">&gt;</span> <span class="identifier">X</span><span class="special">{</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">,</span><span class="number">4</span><span class="special">,</span><span class="number">5</span><span class="special">};</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special">&lt;</span><span class="keyword">double</span><span class="special">&gt;</span> <span class="identifier">Y</span><span class="special">{</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">,</span><span class="number">4</span><span class="special">,</span><span class="number">5</span><span class="special">};</span>
<span class="keyword">using</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">statistics</span><span class="special">::</span><span class="identifier">chatterjee_correlation</span><span class="special">;</span>
<span class="keyword">double</span> <span class="identifier">coeff</span> <span class="special">=</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="identifier">X</span><span class="special">,</span> <span class="identifier">Y</span><span class="special">);</span>
</pre>
<p>
      The implementation follows <a href="https://arxiv.org/pdf/1909.10140.pdf" target="_top">Chatterjee's
      paper</a>.
    </p>
<p>
      <span class="emphasis"><em>Nota bene:</em></span> If the input containers hold
      an integer type, the result is returned as a double-precision value.
    </p>
<h4>
<a name="math_toolkit.chatterjee_correlation.h2"></a>
      <span class="phrase"><a name="math_toolkit.chatterjee_correlation.invariants"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.invariants">Invariants</a>
    </h4>
<p>
      The function expects at least two samples, a non-constant vector Y, and
      as many X values as Y values. If Y is constant, the result is a quiet NaN.
      The data set must already be sorted by X values when the function is called.
      If there are ties in the values of X, the statistic is not uniquely
      determined, because Chatterjee's definition breaks such ties at random.
      Random numbers are not used internally, but the result then depends on the
      order in which the tied samples appear and is not guaranteed to be identical
      on different systems.
    </p>
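<p>
      Since the data must be sorted by X before the call, paired samples usually
      need to be reordered together. One possible way to do this is sketched
      below. The helper <code class="literal">sort_pairs_by_x</code> is a
      hypothetical name and is not part of the library. It uses
      <code class="literal">std::stable_sort</code> so that tied X values keep
      their input order, which makes the tie order reproducible for a given input.
    </p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Reorder paired samples so that x is ascending, keeping each y with its x.
// sort_pairs_by_x is a hypothetical helper; it is not part of Boost.Math.
std::pair<std::vector<double>, std::vector<double>>
sort_pairs_by_x(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());

    // Sort an index permutation by x instead of sorting x directly,
    // so the same permutation can be applied to y.
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    // Apply the permutation to both samples.
    std::vector<double> xs, ys;
    xs.reserve(x.size());
    ys.reserve(y.size());
    for (std::size_t i : idx) {
        xs.push_back(x[i]);
        ys.push_back(y[i]);
    }
    return std::make_pair(std::move(xs), std::move(ys));
}
```

<p>
      The two returned vectors could then be passed as the first and second
      arguments of <code class="literal">chatterjee_correlation</code>.
    </p>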
<h4>
<a name="math_toolkit.chatterjee_correlation.h3"></a>
      <span class="phrase"><a name="math_toolkit.chatterjee_correlation.references"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.references">References</a>
    </h4>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
          Chatterjee, Sourav. "A new coefficient of correlation." Journal
          of the American Statistical Association 116.536 (2021): 2009-2022.
        </li></ul></div>
</div>
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
      Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
      Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
      Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
      Walker and Xiaogang Zhang<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="linear_regression.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../statistics.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../vector_functionals.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>
